ITEP-32416 Add FP16 inference with feature flag #233

Merged
merged 5 commits into main from vitalii/use-fp16-inference on May 22, 2025

Conversation

@itallix itallix commented May 16, 2025

📝 Description

This PR introduces FEATURE_FLAG_FP16_INFERENCE to control which model precision is used for inference operations. The change allows for more efficient resource utilization while maintaining backward compatibility.

Changes:

  • Added new feature flag FEATURE_FLAG_FP16_INFERENCE to control model precision selection
  • Updated model selection logic to prioritize models based on feature flag setting
  • Implemented fallback mechanism between FP16 and FP32 models
  • Ensured appropriate XAI-enabled models are exported based on selected precision
  • Added tests covering both feature flag states

Details

When enabled, the system will:

  • Use FP16 models for inference operations
  • Fall back to FP32 models when FP16 isn't available, and vice versa when the flag is disabled (see the sketch below)
  • Export appropriate XAI-enabled models according to selected precision

This change optimizes resource utilization by deploying more compact FP16 models that consume less memory and storage while maintaining inference performance.
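
For illustration, here is a minimal sketch of the flag-driven selection described above, in plain Python; the function name resolve_inference_precision is hypothetical, not the actual train_helpers.py code:

```python
from enum import Enum


class ModelPrecision(Enum):
    FP16 = "FP16"
    FP32 = "FP32"


def resolve_inference_precision(
    fp16_flag_enabled: bool, available: set[ModelPrecision]
) -> ModelPrecision:
    """Pick the preferred precision, falling back to the other one if missing."""
    preferred, fallback = (
        (ModelPrecision.FP16, ModelPrecision.FP32)
        if fp16_flag_enabled
        else (ModelPrecision.FP32, ModelPrecision.FP16)
    )
    if preferred in available:
        return preferred
    if fallback in available:
        return fallback
    raise RuntimeError("No FP16 or FP32 model is available for inference")
```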

JIRA: ITEP-32416, ITEP-66504, ITEP-66505

✨ Type of Change

Select the type of change your PR introduces:

  • 🐞 Bug fix – Non-breaking change which fixes an issue
  • 🚀 New feature – Non-breaking change which adds functionality
  • 🔨 Refactor – Non-breaking change which refactors the code base
  • 💥 Breaking change – Changes that break existing functionality
  • 📚 Documentation update
  • 🔒 Security update
  • 🧪 Tests

🧪 Testing Scenarios

Describe how the changes were tested and how reviewers can test them too:

  • ✅ Tested manually
  • 🤖 Run automated end-to-end tests

✅ Checklist

Before submitting the PR, ensure the following:

  • 🔍 PR title is clear and descriptive
  • 📝 For internal contributors: If applicable, include the JIRA ticket number (e.g., ITEP-123456) in the PR title. Do not include full URLs
  • 💬 I have commented my code, especially in hard-to-understand areas
  • 📄 I have made corresponding changes to the documentation
  • ✅ I have added tests that prove my fix is effective or my feature works

@itallix itallix requested a review from a team as a code owner May 16, 2025 11:40
@itallix itallix marked this pull request as draft May 16, 2025 11:41
@github-actions github-actions bot added the IAI Interactive AI backend label May 16, 2025

@leoll2 leoll2 left a comment

Looks great, minor comments

@itallix itallix requested a review from Copilot May 16, 2025 15:01

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a feature flag to toggle FP16 inference and updates model creation, selection logic, and tests to support an FP16-first pipeline with FP32 fallback.

  • Introduces FEATURE_FLAG_FP16_INFERENCE and uses it in prepare_train and ModelRepo to choose precision order.
  • Renames mo_fp32_with_xai to mo_with_xai across code, fixtures, and tests.
  • Updates ModelRepo.get_latest_model_for_inference* to fetch all matching precisions and implement FP16/FP32 fallback.
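
In rough terms, the fallback amounts to the following sketch (assuming the aggregation already returns only FP16/FP32 candidates ordered newest-first; pick_model_doc is an illustrative name, not the real method):

```python
from typing import Any


def pick_model_doc(
    matched_docs: list[dict[str, Any]], prefer_fp16: bool
) -> dict[str, Any] | None:
    """Return the newest document of the preferred precision, else of the other one."""
    order = ("FP16", "FP32") if prefer_fp16 else ("FP32", "FP16")
    for precision in order:
        for doc in matched_docs:  # matched_docs is assumed newest-first
            if doc["precision"] == precision:
                return doc
    return None
```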

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Summary per file:

  • interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py – Use feature flag to set FP16 or FP32 for the XAI model
  • interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py – Swap mo_fp32_with_xai references to mo_with_xai
  • interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py – Add FEATURE_FLAG_FP16_INFERENCE enum entry
  • interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py – Rename mo_fp32_with_xai to mo_with_xai in IDs and parsing
  • interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py – Update the inference query to include both precisions and implement the fallback logic
  • Tests and fixtures (multiple files) – Rename fields/tests for mo_with_xai and cover both flag states
Comments suppressed due to low confidence (2)

interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:450

  • Aggregation pipeline lacks a $sort stage to ensure the latest model is returned first; this can lead to selecting an older model ID when multiple precisions exist—add sorting by version or _id before projecting.
matched_docs = list(self.aggregate_read(aggr_pipeline))
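
A sketch of the suggested fix, with the pipeline abbreviated to the two relevant stages (the ModelPrecision stub stands in for iai_core's enum):

```python
from enum import Enum


class ModelPrecision(Enum):  # stub standing in for iai_core's enum
    FP16 = "FP16"
    FP32 = "FP32"


# Abbreviated pipeline with the explicit $sort the comment asks for; the
# $match filter mirrors the one quoted later in this review.
aggr_pipeline = [
    {"$match": {"precision": {"$in": [ModelPrecision.FP16.name, ModelPrecision.FP32.name]}}},
    {"$sort": {"version": -1, "_id": -1}},  # newest version first, _id breaks ties
]
```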

interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py:431

  • [nitpick] Update this docstring to reflect the new FP16-first behavior when the feature flag is enabled, e.g. mention M6_FP16 as expected under fp16-enabled.
The latest model for inference is M4 (the first one generated after the base model).

@github-actions github-actions bot added the UI label May 19, 2025
…hen fetching models + add unit test for fallback model
@itallix itallix requested a review from Copilot May 19, 2025 13:58

@Copilot Copilot AI left a comment

Pull Request Overview

The PR adds a new feature flag (FEATURE_FLAG_FP16_INFERENCE) to support FP16 inference and updates model selection logic, renaming the optimized model field from mo_fp32_with_xai to mo_with_xai throughout the code and tests. Key changes include:

  • Adding FP16 feature flag support in feature flag services and enum.
  • Updating model creation and selection logic to conditionally use FP16 based on the flag.
  • Refactoring tests and fixture data to reflect the renamed model field and validate FP16 and FP32 scenarios.
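
The flag addition itself boils down to one enum entry plus a lookup; a hedged sketch follows, where the environment-variable check is a stand-in and not necessarily how the project's FeatureFlagProvider resolves flags:

```python
import os
from enum import Enum


class FeatureFlag(Enum):  # illustrative subset of the project's flag enum
    FEATURE_FLAG_FP16_INFERENCE = "FEATURE_FLAG_FP16_INFERENCE"


def is_enabled(flag: FeatureFlag) -> bool:
    """Stand-in lookup; the real code delegates to the feature flag provider."""
    return os.environ.get(flag.value, "false").lower() == "true"


prefer_fp16 = is_enabled(FeatureFlag.FEATURE_FLAG_FP16_INFERENCE)
```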

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Summary per file:

  • web_ui/src/core/feature-flags/services/feature-flag-service.interface.ts – Added the FEATURE_FLAG_FP16_INFERENCE flag to the development features
  • interactive_ai/workflows/geti_domain/train/tests/unit/workflows/test_train_workflow.py – Updated references to mo_with_xai in tests after the rename
  • interactive_ai/workflows/geti_domain/train/tests/unit/tasks/prepare_and_train/test_train_helpers.py – Added parameterized tests for feature-flag-driven precision selection
  • interactive_ai/workflows/geti_domain/train/tests/unit/tasks/evaluate_and_infer/test_evaluate_and_infer.py – Updated model field references in evaluation/inference tests
  • interactive_ai/workflows/geti_domain/train/tests/fixtures/train_workflow_data.py – Updated the fixture to use the new model field name mo_with_xai
  • interactive_ai/workflows/geti_domain/train/job/tasks/prepare_and_train/train_helpers.py – Modified model builder creation to choose FP16 or FP32 based on the feature flag
  • interactive_ai/workflows/geti_domain/train/job/tasks/evaluate_and_infer/evaluate_and_infer.py – Updated inference tasks to reference the new model field naming
  • interactive_ai/workflows/geti_domain/common/jobs_common_extras/mlflow/utils/train_output_models.py – Refactored TrainOutputModelIds and TrainOutputModels to use mo_with_xai
  • interactive_ai/workflows/geti_domain/common/jobs_common/features/feature_flag_provider.py – Added FEATURE_FLAG_FP16_INFERENCE to the enumerated flags
  • interactive_ai/libs/iai_core_py/tests/repos/test_model_repo.py – Revised model repository tests to validate the FP16/FP32 selection logic
  • interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py – Updated repository queries and aggregation pipelines to prioritize FP16 via the feature flag
Comments suppressed due to low confidence (1)

interactive_ai/libs/iai_core_py/iai_core/repos/model_repo.py:398

  • [nitpick] Consider verifying that sorting by _id ascending reliably reflects the creation order when selecting the earliest model for inference; if _id does not guarantee chronological order, you might sort using a dedicated timestamp field for better clarity.
"precision": {"$in": [ModelPrecision.FP16.name, ModelPrecision.FP32.name]},

@itallix itallix marked this pull request as ready for review May 20, 2025 07:21
@itallix itallix requested a review from leoll2 May 20, 2025 07:21
leoll2 previously approved these changes May 20, 2025
MarkRedeman previously approved these changes May 20, 2025
@itallix itallix dismissed stale reviews from MarkRedeman and leoll2 via 5fdb21b May 20, 2025 10:14
@itallix itallix requested a review from leoll2 May 20, 2025 10:16
@itallix itallix removed the UI label May 21, 2025
@itallix itallix added this pull request to the merge queue May 22, 2025
Merged via the queue into main with commit c104709 May 22, 2025
22 checks passed
@itallix itallix deleted the vitalii/use-fp16-inference branch May 22, 2025 12:22